Abstract

Factorized Information Criterion (FIC) is a recently developed information
criterion, based on which a novel model selection methodology, namely
Factorized Asymptotic Bayesian (FAB) Inference, has been developed and
successfully applied to various hierarchical Bayesian models. The Dirichlet
Process (DP) prior, and one of its well-known representations, the Chinese
Restaurant Process (CRP), give rise to another line of model selection
methods. FIC can be viewed as a prior distribution over the latent variable
configurations. Under this view, we prove that when the parameter
dimensionality $D_c = 2$, FIC is equivalent to CRP. We argue that when
$D_c > 2$, FIC avoids an inherent problem of DP/CRP, namely that the data
likelihood dominates the impact of the prior, so that the model selection
capability weakens as $D_c$ increases. However, FIC overestimates the data
likelihood. As a result, FIC may be overly biased towards models with fewer
components. We propose a natural generalization of FIC, which finds a middle
ground between CRP and FIC, and may yield more accurate model selection
results than FIC.


1 Equivalence of FIC and CRP when $D_c = 2$


Suppose there is a sequence of 1-of-$K$ latent coding variables
$Z = z_1, \ldots, z_N$. For any $k$, let $n_k = \sum_i z_{ik}$. Then $Z$
corresponds to a partition of $N$ numbers into $K$ sets $S_1, \ldots, S_K$,
where $|S_k| = n_k$ for all $k$. This partition is denoted as
$B = (S_1, \ldots, S_K)$. The correspondence between $Z$ and the partition $B$
is referred to as $Z$ maps to $B$, denoted as $Z \mapsto B$.
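
As a concrete illustration of this notation, the following Python sketch builds a 1-of-$K$ coding from a list of cluster labels and recovers the counts $n_k$ and the sets $S_k$. The function name `partition_from_labels` and the 0-based label encoding are assumptions made for this example only.

```python
def partition_from_labels(labels, K):
    """Return (Z, counts, sets) for cluster labels in {0, ..., K-1}.

    Hypothetical helper for illustration; labels are 0-based by assumption.
    """
    # Z[i][k] = 1 iff item i is assigned to cluster k (1-of-K coding).
    Z = [[1 if lab == k else 0 for k in range(K)] for lab in labels]
    # n_k = sum_i z_ik
    counts = [sum(row[k] for row in Z) for k in range(K)]
    # S_k = set of item indices i with z_ik = 1, so |S_k| = n_k
    sets = [{i for i, row in enumerate(Z) if row[k] == 1} for k in range(K)]
    return Z, counts, sets


Z, counts, sets = partition_from_labels([0, 1, 0, 2, 1, 0], K=3)
print(counts)  # [3, 2, 1]
print(sets)    # [{0, 2, 5}, {1, 4}, {3}]
```

Each row of `Z` contains exactly one 1, and the sizes of the returned sets agree with the counts.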

A Chinese restaurant process assigns to this sequence a prior probability
$P(Z)$. There are multiple configurations of $Z$ mapping to the same $B$.
These configurations of $Z$ form an equivalence class
$\{Z \mid Z \mapsto B\}$. When it is clear from context, we also denote
$B = \{Z \mid Z \mapsto B\}$. The probability of this equivalence class is:


$$P_{\mathrm{CRP}}(B) = \left|\{Z \mid Z \mapsto B\}\right| \, P(Z_0) \propto \frac{1}{\prod_{n_k > 0} n_k} \qquad (1)$$

where $Z_0$ is any configuration that maps to $B$.

Note that $K$ is a free parameter. Fixing $K$ to a particular value, we obtain
a distribution of $B$ conditioned on $K$:

$$P_{\mathrm{CRP}}(B \mid K) = \frac{1}{Z_K} \prod_{k=1}^{K} \frac{1}{n_k},$$

where $Z_K$, a sum over all $n_1, \ldots, n_K$ with $n_k > 0$ for every $k$
and $\sum_k n_k = N$, is the normalizing constant.


When $D_c = 2$, the FIC regularization term in [1, eq. 9] is proportional to
$\prod_k n_k^{-1}$. By comparing this with the CRP probability of the
equivalence class given above, one can see that this regularizer term is
equivalent to the CRP prior over the equivalence class when the model
parameter dimensionality $D_c = 2$.
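
The equivalence can be checked numerically in a small case. The sketch below rests on two assumptions that are not spelled out in the surviving text: that the CRP assigns a partition with block sizes $n_1, \ldots, n_K$ the standard probability $\alpha^K \prod_k (n_k - 1)! \,/\, \prod_{i=0}^{N-1} (\alpha + i)$, and that an equivalence class collects all $N! / \prod_k n_k!$ assignments sharing the same sizes. The function names and the concentration parameter `alpha` are hypothetical.

```python
from math import factorial, prod


def crp_partition_prob(sizes, alpha=1.0):
    """Standard CRP probability of one partition with the given block sizes."""
    n = sum(sizes)
    rising = prod(alpha + i for i in range(n))  # alpha (alpha+1) ... (alpha+N-1)
    return alpha ** len(sizes) * prod(factorial(s - 1) for s in sizes) / rising


def num_assignments(sizes):
    """Assumed class size: ways to split N items into labelled blocks of these sizes."""
    return factorial(sum(sizes)) // prod(factorial(s) for s in sizes)


def crp_class_prob(sizes, alpha=1.0):
    """Probability of the whole class = class size * probability of one member."""
    return num_assignments(sizes) * crp_partition_prob(sizes, alpha)


def fic_regularizer(sizes, d_c=2):
    """Unnormalized FIC regularizer prod_k n_k^(-D_c/2)."""
    return prod(s ** (-d_c / 2) for s in sizes)


a, b = (1, 5), (3, 3)  # same N = 6 and K = 2, different sizes
crp_ratio = crp_class_prob(a) / crp_class_prob(b)
fic_ratio = fic_regularizer(a) / fic_regularizer(b)
print(crp_ratio, fic_ratio)  # both ratios are 9/5 up to floating-point rounding
```

Because only ratios at fixed $N$ and $K$ are compared, the normalizing constant and the value of `alpha` cancel out.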


2 Stronger Model Selection of FIC when $D_c > 2$

For higher $D_c$, $P_{\mathrm{FIC}}(Z \mid K) \propto \left(\prod_k n_k^{-1}\right)^{D_c/2}$,
i.e. the FIC regularizer becomes sharper and more biased among different
configurations of $Z$. To analyze the significance of the exponent $D_c/2$,
suppose we use a prior $p(Z)$ of $Z$ in a model, where the data likelihood is
given by $p(X \mid Z, \theta)$. The posterior of $Z$ is proportional to
$p(X \mid Z, \theta)\, p(Z)$.

We first suppose the configuration of $Z$ is known, and consider a Laplace
approximation of $p(X \mid Z, \theta)$ w.r.t. $\theta$. When $D_c$ is larger,
$p(X \mid Z, \theta)$ decreases more quickly as $\theta$ deviates from the
maximum likelihood estimator (MLE) $\hat{\theta}$. That is, in
$p(X \mid Z, \theta)\, p(Z)$, if $D_c$ is large enough and the prior $p(Z)$
does not change with $D_c$, the effect of $p(Z)$ will be dominated by
$p(X \mid Z, \theta)$. In other words, the regularization brought about by
the prior will weaken as $D_c$ increases. The CRP prior is covered by this
analysis. In contrast, because the FIC regularizer becomes sharper as $D_c$
increases, $p(X \mid Z, \theta)$ will not come to dominate $p(Z)$ as the
dimensionality of $\theta$ grows. This suggests that FIC will have a stronger
model selection effect and tend to be more “parsimonious” than CRP when the
parameter dimensionality is high.
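
To make the contrast concrete, the following Python sketch compares how strongly each regularizer separates two configurations as $D_c$ grows. It assumes the unnormalized forms $\prod_k n_k^{-1}$ for CRP and $\prod_k n_k^{-D_c/2}$ for FIC; the function names and the example cluster sizes are hypothetical, and the data likelihood is not modeled.

```python
from math import log


def log_regularizer(sizes, exponent):
    """log of prod_k n_k^(-exponent), ignoring the normalizing constant."""
    return -exponent * sum(log(n) for n in sizes)


def log_prior_gap(sizes_a, sizes_b, exponent):
    """Log-scale preference of the regularizer for configuration a over b."""
    return log_regularizer(sizes_a, exponent) - log_regularizer(sizes_b, exponent)


unbalanced, balanced = (99, 1), (50, 50)  # same N = 100 and K = 2

# The CRP form has a fixed exponent of 1, so its gap does not depend on D_c.
crp_gap = log_prior_gap(unbalanced, balanced, exponent=1.0)

for d_c in (2, 10, 50):
    # The FIC form uses exponent D_c / 2, so its gap grows linearly in D_c.
    fic_gap = log_prior_gap(unbalanced, balanced, exponent=d_c / 2)
    print(f"D_c={d_c:2d}  CRP gap={crp_gap:.2f}  FIC gap={fic_gap:.2f}")
```

The CRP gap stays constant while the FIC gap scales with $D_c/2$, which is the property the argument above relies on: the FIC regularizer keeps pace with a likelihood that sharpens as the parameter dimensionality increases.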

3 Possible Limitations and Generalizations of FIC 

The analysis in Section 2 is based on the Laplace approximation, in which the
approximating Gaussian is accurate only in a small region around the MLE
$\hat{\theta}$ of $p(X \mid Z, \theta)$. It might be the case that the
estimated marginal probability is greater than the actual probability. In
fact, during the derivation of Factorized Asymptotic Bayesian inference, [1]
assumes that a single estimate $\hat{\theta}$ is also the MLE of
$p(X \mid Z, \theta)$ for each particular $Z$. Therefore the estimated
marginal probability would be greater than the actual marginal probability.

In this regard, we could extend FIC to a Generalized FIC (GFIC), which is
milder thanks to a smaller exponent, i.e.
$P_{\mathrm{GFIC}}(Z \mid K) \propto \left(\prod_k n_k^{-1}\right)^{d}$,
where $1 < d < D_c/2$.
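
Written on the log scale, the role of the exponent $d$ as an interpolation parameter becomes explicit. The display below is a sketch that assumes the unnormalized forms $\prod_k n_k^{-1}$ for CRP and $\prod_k n_k^{-D_c/2}$ for FIC; the two boundary values of $d$ are excluded by the constraint $1 < d < D_c/2$ and are shown only as limiting cases.

```latex
\begin{align*}
\log P_{\mathrm{GFIC}}(Z \mid K) &= -d \sum_{k=1}^{K} \log n_k + \text{const}, \\
d = 1 &: \quad \text{the CRP form } \textstyle\prod_k n_k^{-1}, \\
d = D_c/2 &: \quad \text{the FIC form } \textstyle\prod_k n_k^{-D_c/2}.
\end{align*}
```

Any $d$ strictly between these limits penalizes unbalanced-versus-balanced differences more than CRP and less than FIC.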